

# Estimating Beam Time for Heavy Ion Single-Event-Effects (SEE) Testing (and a few hints on efficiency)

*Kenneth A. LaBel*

*SSAI, Inc., in support of NASA-GSFC*

[kenneth.a.label@nasa.gov](mailto:kenneth.a.label@nasa.gov)



# Outline

- Abstract
- Types of SEE Tests
  - Requirements drive test performance
- Know Your Device
  - Device complexity, operational scenarios, and event signatures
- Test System Features and Operational Planning
  - Test Plan and Flexibility are Crucial
- Arriving at the Facility
- What to Know about Beam Selection
- Efficiency Considerations
  - Test Set
  - Test Performance
  - Beam/Parameter Changes
  - Enter/Exit the Chamber
- Debug Considerations and Backup Plans
- Rules of Thumb?
- Final Thoughts



**Ken's first CPU!**

## 650x Processor

- 8 um feature size (not a typo) – ~1975
  - » 8-bit CPU
  - » Up to 14 MHz
  - » 64 KB RAM
  - » 256 bytes stack
  - » No I/O ports
  - » 28 or 40-pin DIP

# Abstract

- While improvements to heavy ion beam capacity are underway at multiple sites, beam time is still expensive and competition to obtain it is high.
- This presentation provides three critical items:
  - Considerations for estimating the amount of time needed to perform a test for scheduling purposes,
  - Hints for making efficient use of the time at the beam, and,
  - Having a debug/backup plan.
- This is not intended to provide a specific answer for a specific device, but rather keys to a successful return on investment (ROI).

# BLUF – The Time It Takes

- It all boils down to
  - Time with beam on
    - » Length of test runs
    - » # of test runs and test conditions to meet requirements
  - Time with beam off
    - » Set up and tear down times (if included in the time slot)
    - » Time between test runs
    - » Time to change test samples
    - » Time to change ions/beam parameters
    - » Debug or change of part type
- The real goal is simple: maximize beam on and minimize beam off



**Do you need a full curve to meet your requirement?**

Courtesy M. Casey, NASA

# KNOW THY TEST PURPOSE

# Not All SEE Testing is Done with the Same End Goal



***The point is that ALL tests have a goal/requirement, and how much data/time is needed will vary***

# The SEE Test “Space” Will Vary by the Test Goals

- The test parameter/run space is based on the test goals. Examples only:
  - Generic Test
    - » A product qualification test for a Mil/Aero product.
    - » Usually provides worst-case information for destructive events, but limited information for non-destructive events (corner cases/nominal only, for example). May require additional application-specific testing for missions.
  - Mission Application
    - » SEE rates for availability/reliability for that mission environment
    - » Event signature capture for mission mitigation design, ...
  - Characterization
    - » Technology or architecture research
  - Go/No-Go or Downselect
    - » Pass/Fail: do you see an event at a given LET or not?
    - » Testing multiple vendors and picking best performing for more complete testing
    - » Destructive pass/fail at a given LET and test condition
  - RLAT (Qualification)
    - » Testing to ensure the specific lot meets acceptance criteria
  - System/Assembly Level (or System on a Chip – SOC,...)
    - » Mitigation validation
    - » Dominant failure mode identification,...



***Coupling test goals with device complexity is key to estimating the time needed***

# KNOW YOUR DEVICE



# “Simple” Context for Device Complexity and SEE

- So, what happens to the device under test (DUT) when the beam is on?
  - Nothing (is the cap off the end of the beam line? ☺)
  - One type of upset event signature: transient, cell state change aka bit flip, ...
  - A myriad of event signatures – bit flips and single event functional interrupts (SEFIs), state changes, interrupts,...
- Ignoring the nothing, the single event signature type provides a **homogeneous** set of test data that is usually easily graphed on a curve
- The myriad of event signatures provides a **heterogeneous** data set with each event type needing analysis separately (unless the requirement only cares about “blue screen of death” (BSOD) or similar). Note that different signatures may begin at different LETs.



**Homogeneous data is  
easily graphed**

Courtesy M. Casey, NASA

# For SEE Testing, Start with a Datasheet

- For initial planning, after searching for any existing data on the product or family, the functional block diagram is a good place to start
  - Review each functional block
  - Determine potential SEE types
    - » Upset (SEU), transient (SET), stuck bit, ...
  - Estimate error propagation and signatures
- Next step is to figure out data capture
  - How will you observe the event?
  - Considerations for event recovery

**It's important to understand limitations of data capture as well as "dominant effects" (usually, the large physical blocks within a device like memory arrays).**

**The occurrence of dominant effects may hide (mask) other effects during a test run due to the accelerated nature of the beam.  
(more on this later)**



[https://www.xilinx.com/support/documentation/data_sheets/ds190-Zynq-7000-Overview.pdf](https://www.xilinx.com/support/documentation/data_sheets/ds190-Zynq-7000-Overview.pdf)

# Sample block analysis and post-test recommendations

| Chip Area          | SEE Issue                                                                                                | Possible SEU Mitigation                                                                |
|--------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| Config. Memory     | Single and multiple bit errors corrupting circuit operation, causing bus conflicts (current creep), etc. | Scrubbing; Partial reconfiguration                                                     |
| Config. Controller | Improper device configuration can occur if hit during configuration/reconfiguration                      | Partitioned design; Multiple chip voting (redundancy by using multiple devices)        |
| CLB                | Logic hits and propagated upsets caused by transients                                                    | Triple modular redundancy (TMR) (or Xilinx TMR – XTMR); Acceptable error rates         |
| BRAM               | Memory upsets in user area                                                                               | TMR; Error Detection and Correction (EDAC) scrubbing                                   |
| Half-latches       | Sensitive structure used in configuration/routing                                                        | Removal of half-latches from design                                                    |
| POR                | SEUs on POR can cause inadvertent reboot of device                                                       | Multiple chip voting (redundancy by using multiple devices)                            |
| IOB                | SEUs can cause false outputs to other devices or inputs to logic                                         | Leverage Immune Config. Memory cell; Evaluate input SET propagation                    |
| DCM                | Can cause clock errors that spread across clock cycles                                                   | TMR; Temporal TMR                                                                      |
| DSP                | Hard IP that is unhardened that can cause single event functional interrupts (SEFIs) or data errors      | TMR; Temporal TMR                                                                      |
| MGT                | Gigabit transceivers. Hits in logic can cause bursts or SEFIs. Otherwise, bit errors in data stream      | TMR; Protocol re-writes                                                                |
| PPC                | Hard IP that is unhardened. SEFIs are prime concern                                                      | TMR or software task redundancy                                                        |
| SEL                | Higher current condition that is potentially damaging                                                    | No mitigation other than substrate addition (epi); Circumvention techniques possible   |

LaBel, GOMAC 2007

# But What Else Can I Use to Pre-Plan SEE Tests?

- The idea is to review the **factors that can affect SEE** for the device under test (DUT) through data diving (aka similarity)
- The table illustrates some characteristics that may affect SEE sensitivity and signature types (though significance varies by device)
- The idea is to find part info (foundry, technology,...) and similarity data (family, architecture, ...) and use them to estimate beam parameters (ions/LETs/flux) and test system operation, aka data capture (event signatures, rate of capture,...)

| Characteristics                                               | Descriptions                                                                                                                                                                                                                         |
|---------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <b>Foundry</b>                                                | Manufacturer of the active semiconductor portion of the device. Example, GlobalFoundries. The "same" product built at different foundries may have significantly different SEE characteristics.                                      |
| <b>Process</b>                                                | Technology and specific fab process within a foundry/manufacturer. Ex., bipolar technology built on XKQD process. May eliminate or add some SEE concerns.                                                                            |
| <b>Feature Size</b>                                           | Geometric transistor/cell size or similar. How big individual targets are for SEE ion strikes. More of a cross-section than threshold issue.                                                                                         |
| <b>Wafer/lot/package</b>                                      | Potential known variance by lot, wafer, etc. of a product. Usually not a dominant contributor to threshold/cross-section, but has been observed.                                                                                     |
| <b># of transistors/cell/etc</b>                              | # of potential targets for ion strikes. Usually more of a cross-section than threshold concern.                                                                                                                                      |
| <b>Die size</b>                                               | Target area for SEE risk. Usually affects cross-section more than threshold.                                                                                                                                                         |
| <b>Family</b>                                                 | Is there any known SEE sensitivity/data on a specific manufacturer's product family?                                                                                                                                                 |
| <b>Architecture</b>                                           | Is there related information on any parts with similar architecture? Consider for example, Buck Regulator architecture for power conversion. Types of SEE event signatures may be gleaned.                                           |
| <b>Functional Blocks/IP</b>                                   | Have devices with similar IP been tested (or perhaps partial on device of interest)?                                                                                                                                                 |
| <b>Operating characteristics (frequency, voltage, etc...)</b> | How much do the specific operating conditions affect the SEE response? Simple examples: dV for transients in an op amp or frequency for SET capture in a shift register string. Application specific test needs versus data found.  |
| <b>Other</b>                                                  | Specific device types or technologies may have additional considerations.                                                                                                                                                            |

# To Clarify

- We do these things to
  - **Estimate Error Signatures that the test set needs to capture,**
  - **Anticipate Event Rates to set data capture rate capabilities and beam flux, and,**
  - **Maximize efficiency (minimize time lost) for Event Recovery.**
    - » The point is to return to a known state in a deterministic manner after the event to allow the test run to continue.
    - » Keep in mind that you'll need to factor in the beam time on/off to normalize results.
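
One way to picture the normalization in the last bullet, as a minimal sketch: if the DUT spends part of a run recovering (and so cannot register events), only the fluence delivered while it was live should count toward the cross-section. The sketch assumes a constant flux over the run. The function names, parameters, and run numbers are illustrative assumptions, not part of any facility software.

```python
def effective_fluence(total_fluence, beam_on_s, dead_time_s):
    """Fluence (ions/cm^2) delivered while the DUT could register events.

    Assumes a constant flux, so fluence scales with the live fraction of the run.
    """
    if beam_on_s <= 0:
        raise ValueError("beam_on_s must be positive")
    if not 0 <= dead_time_s <= beam_on_s:
        raise ValueError("dead_time_s must be between 0 and beam_on_s")
    live_fraction = (beam_on_s - dead_time_s) / beam_on_s
    return total_fluence * live_fraction


def cross_section(events, total_fluence, beam_on_s, dead_time_s=0.0):
    """Event cross-section (cm^2) normalized to the live-time fluence."""
    fluence = effective_fluence(total_fluence, beam_on_s, dead_time_s)
    if fluence <= 0:
        raise ValueError("no live-time fluence to normalize against")
    return events / fluence


# Hypothetical run: 50 events, 1e6 ions/cm^2 over 100 s, 20 s spent rebooting.
sigma_raw = cross_section(50, 1e6, 100.0)
sigma_live = cross_section(50, 1e6, 100.0, dead_time_s=20.0)
print(f"raw: {sigma_raw:.2e} cm^2, live-time corrected: {sigma_live:.2e} cm^2")
```

Ignoring the recovery time understates the cross-section, which is why the test set should log when the DUT was actually observable.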

# Remember: Complete Testing is a Fallacy

- Lots of possible test modes, conditions, patterns, etc., as seen in the sample test matrix below
- Combining test goals with device complexity and application coverage drives efficient beam usage

*Can we test anything completely?*



Sample Single Event Effect Test Matrix

*full generic testing*

| Amount | Item                                |
|--------|-------------------------------------|
| 3      | Number of Samples                   |
| 68     | Modes of Operation                  |
| 4      | Test Patterns                       |
| 3      | Frequencies of Operation            |
| 3      | Power Supply Voltages               |
| 3      | Ions                                |
| 3      | Hours per Ion per Test Matrix Point |

66096 Hours

2754 Days

7.54 Years

*and this didn't include temperature variations!!!*
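
The totals above are just the product of the matrix entries. A short sketch reproduces the arithmetic. The dictionary keys are shorthand for the table rows, and the 365.25-day year is an assumption that happens to match the 7.54-year figure.

```python
import math

# Entries of the sample test matrix for full generic testing.
matrix = {
    "samples": 3,
    "modes of operation": 68,
    "test patterns": 4,
    "frequencies of operation": 3,
    "power supply voltages": 3,
    "ions": 3,
    "hours per ion per matrix point": 3,
}

hours = math.prod(matrix.values())  # every combination, 3 hours each
days = hours / 24
years = days / 365.25

print(hours, days, round(years, 2))  # 66096 2754.0 7.54
```

Dropping a single factor (for example, testing only the modes the application uses) shrinks the total multiplicatively, which is the argument for application-oriented planning.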



Commercial 1 Gb SDRAM  
68 operating modes  
operates to >500 MHz  
Vdd 1.8V external, 1.25V internal

Test planning requires much more thought in the modern age  
as does understanding of data collected (be wary of databases).  
Only so much can be done in a 12 hour beam run – application-oriented

Scaled CMOS Test Challenges – Presented by Kenneth A. LaBel, GOMAC Conference, Orlando, FL 3/22/07




# TEST SYSTEM CONSIDERATIONS

# This Isn't a Presentation on How to Build a Test Set or Develop a Plan, But....

- There are a few points that should be deliberated as part of the efficiency discussion
  - Data capture observability: what the test system can or can't capture as well as timing of event occurrence vs observation
  - Data capture rate: how fast the system can observe events without being overwhelmed (i.e. events/sec or similar). Think of how this influences flux rate selection.
  - Event recovery to a known state
    - » How fast to recover/flush and implications to fluence interpretation (i.e., did the device reboot and what do you do with the fluence during the reboot time)
  - Flexibility
    - » Events don't always look like they're expected to. Test sets need to be flexible to accommodate.
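
The link between data capture rate and flux can be sketched with the usual relation: expected event rate = cross-section × flux. The snippet below is a rough planning aid under that assumption. The cross-section estimate would come from similarity data, the `headroom` factor is an arbitrary safety margin, and all names and numbers are hypothetical.

```python
def max_flux(capture_rate_hz, est_cross_section_cm2, headroom=0.5):
    """Highest flux (ions/cm^2/s) keeping the expected event rate
    at or below headroom * capture rate."""
    if capture_rate_hz <= 0 or est_cross_section_cm2 <= 0:
        raise ValueError("capture rate and cross-section must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return headroom * capture_rate_hz / est_cross_section_cm2


def run_seconds(target_fluence, flux):
    """Beam-on time needed to reach a target fluence at a constant flux."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return target_fluence / flux


# Hypothetical case: test set handles 10 events/s, and similar parts
# suggest a device cross-section near 1e-4 cm^2.
flux = max_flux(capture_rate_hz=10, est_cross_section_cm2=1e-4)
print(f"flux <= {flux:.1e} ions/cm^2/s")
print(f"run length for 1e7 ions/cm^2: {run_seconds(1e7, flux):.0f} s")
```

A slower test set forces a lower flux and therefore longer runs for the same fluence, so capture rate feeds directly into the beam time estimate.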

# ARRIVING AT THE FACILITY



# Let's Start with a Few Basic Assumptions

You thoroughly  
validated your test  
system prior to shipping

You've verified that the  
shipment arrived at the  
facility

# Let's be Very Clear

Each of the available heavy ion facilities is different

- Kinetic energies
- Ions available
- Beam control (flux, etc...) and reporting
- Vacuum vs open-air test fixture (and time to change DUT)
- Beam structure
- Mounting/Cabling
- Ion/energy tune capability and time to change
- Target room interlock systems and time to enter/exit
- And so on...

It is incumbent on the test team to be familiar with the chosen test facility (and its resources)

- **A pre-test visit is HIGHLY recommended**

# Partial List of Test Facility Considerations

- **Particle**
  - Test energies/ions
  - Dosimetry/particle detectors/modeling codes
  - Uniformity
  - Particle range
  - Spot size/collimation
  - Test levels
    - » Flux and fluence rates
    - » Beam stability
  - Particle localization
  - Beam structure – pulsed vs continuous
  - Secondary particles
- **Practical**
  - **Technical**
    - » Mechanical/mounting
    - » Cabling/feedthroughs
      - Ethernet, Wi-Fi,...
    - » Power
    - » Ancillary test equipment location (in vault or user area)
    - » Test specific issues
      - Thermal
      - Speed/performance
      - Test conditions
  - **Logistics**
    - » Contracts/purchase
    - » Safety rules (patients first)
      - Personal dosimeters?
    - » Shipping/receiving
    - » Staging/user areas
    - » Operator model
    - » Activated material storage

# Pointers at the Test Site

- Staging area
  - Unpack the equipment and verify that it works just like it did “back in the lab”. Use the same cabling/hardware wherever possible.
- Make the test system as easy to install in the test area as reasonable
  - Most facilities will charge you for your install/deinstall time as part of your time block (in other words, don’t take 4 hours out of a 12-hour timeslot to install and verify, then remove)
  - Remember that you usually have to be out of the test area when your timeslot ends (there are usually customers following you)





# A BIT ON THE BEAM SELECTION

# Kinetic Energy Matters – Trade Space

- The two prime things we care about are
  - **Penetration range (testability) - Y-axis**
  - **LET coverage – X-axis**
- The figure shows the trade space: higher energy (deeper penetration) equates to lower LETs

**What is the LET after passing through the device at the active region?**  
**Stackup tools (SRIM, SEUSS, ...) are usually available, but you need material overlayer information**



# So, Let's Look at the Most Popular "Tune" at TAMU

- This is the same sort of graph for the 15 MeV/amu (or MeV/u) selection of ions at TAMU K500.
  - One can see that the LET varies as the ion penetrates the material (surface entry points are on the right of curves)
  - Using this type of graph or table allows selection of ions (at normal incidence) that will meet your LET test needs



Courtesy Clark/TAMU





**WASTE NOT WANT NOT  
(OR MINIMIZE “BEAM OFF” TIME)**

# Here's Where We Discuss Maximizing the ROI

---

- Remember
  - Beam time is expensive and hard to get
  - The higher the percentage of time the beam is on the DUT, the higher the ROI (usually)
- This section
  - Test Board
  - Test Set
  - Test Performance and Beam/Parameter Changes
  - Enter/Exit the Chamber



# How Big Is Your Beam?

- Standard SEE Testing
  - Irradiate one\* device at a time (other devices are either other DUTs or support)
- Batch Device Irradiation
  - Irradiate a large number of devices simultaneously
- System Irradiation
  - Irradiate entire card or system simultaneously



\*Actually, depending on spot/DUT size, it may take multiple beam positions for full coverage

*Image courtesy of Vanderbilt University*

**We'll focus on Standard SEE Testing (TAMU), but if you can test more than one DUT at a time...**

# Test Board

- Given that most facilities have remote positioning and “stored” DUT locations available, consider having more than 1 DUT on the test board.
- Yes, it adds test system complexity, but it can increase “on” time
  - Test DUT 2 while data from DUT 1 is being stored/Test System 1 is being reinitialized for DUT 1’s next run (assumes independence between the Test System 1/DUT 1 and Test System 2/DUT 2).
  - Saves on entry/exit times for swapping DUTs or samples



# The Equipment to Test the DUT

- Again – we’re not telling you what equipment to use, but some things to think about with the test set/system
  - Remember: the test conductor (you) sits in a remote location from the DUT in the beam
- Design for event observability and capture, but consider
  - Remote control features of equipment in the test area (example, not going into the chamber to power cycle or push a reset button!)
  - The amount of time it takes after a test run to be ready for the **next** test run (store data, decide on next run, configure, verify, and go!)

**It's a trade space for efficiency of test system operation versus verification of reliable operation: you make the decision on the correct amount of each**

# The Bigger Trade Space for Operations

- As stated, each facility is different, and the amount of time needed for each of the following varies significantly:
  - Change ions
  - Change energy or tune energy
  - Change angles
  - Change beam flux
  - Reposition
  - Control start/stop of the beam (is there an electronic means?)
  - Enter/exit the chamber (change DUTs/samples), and so on...
- For example:
  - Knowing the facility capabilities allows proper study of best means of changing LET values (ion change vs energy vs angles, for example) or the trade space for entering the chamber to change samples versus moving to another ion
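
Tilting the DUT is one of the ways listed above to change LET without changing the ion. Below is a minimal sketch of the commonly used cosine approximation, which the slides do not spell out and which is an assumption here: effective LET scales as 1/cos(θ), while the depth reached below the surface shrinks by cos(θ). It presumes a thin sensitive volume and does not hold for every device or effect. The function names and ion values are illustrative.

```python
import math


def effective_let(let_normal, angle_deg):
    """Effective LET at a tilt angle, using the 1/cos(theta) approximation."""
    if not 0 <= angle_deg < 90:
        raise ValueError("angle_deg must be in [0, 90)")
    return let_normal / math.cos(math.radians(angle_deg))


def effective_range(range_normal_um, angle_deg):
    """Depth reached below the surface when the ion path is tilted."""
    if not 0 <= angle_deg < 90:
        raise ValueError("angle_deg must be in [0, 90)")
    return range_normal_um * math.cos(math.radians(angle_deg))


def angle_for_let(let_normal, target_let):
    """Tilt angle (degrees) needed to reach target_let from normal incidence."""
    if let_normal <= 0 or target_let < let_normal:
        raise ValueError("target_let must be >= let_normal > 0")
    return math.degrees(math.acos(let_normal / target_let))


# Hypothetical ion: LET of 30 MeV*cm^2/mg and 100 um range at normal incidence.
for angle in (0, 30, 45, 60):
    let_eff = effective_let(30.0, angle)
    depth = effective_range(100.0, angle)
    print(f"{angle:2d} deg: LET_eff = {let_eff:.1f}, depth = {depth:.1f} um")
```

If an angle change takes a minute or two and an ion change takes much longer, this kind of table helps decide which LET points are worth a new ion, provided the reduced depth still reaches the active region.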

**Caveat: facility times are “typical” times – your mileage may vary (YMMV).**

**BTW, did you check on what ion or tune the previous group ended with? Can you start there? ☺**

# Example Ion Change Times – TAMU K500

| Estimated Beam Change Times for 15 A MeV Beams |                   |    |    |     |     |    |    |     |    |    |    |    |    |    |
|------------------------------------------------|-------------------|----|----|-----|-----|----|----|-----|----|----|----|----|----|----|
| Final 15 A MeV Beam Configuration              | <sup>4</sup> He   | 45 | 55 | 55  | 55  | 55 | 55 | 55  | 55 | 70 | 70 | 70 | 60 | 60 |
|                                                | <sup>14</sup> N   | 40 |    | 40  | 40  | 40 | 40 | 40  | 40 | 50 | 50 | 50 | 50 | 50 |
|                                                | <sup>20</sup> Ne  | 50 | 40 |     | 25* | 35 | 30 | 30  | 30 | 45 | 45 | 45 | 50 | 50 |
|                                                | <sup>40</sup> Ar  | 50 | 40 | 25* |     | 35 | 30 | 30  | 30 | 45 | 45 | 45 | 50 | 50 |
|                                                | <sup>63</sup> Cu  | 50 | 40 | 35  | 35  |    | 35 | 35  | 35 | 45 | 45 | 45 | 50 | 50 |
|                                                | <sup>84</sup> Kr  | 50 | 40 | 30  | 30  | 35 |    | 25* | 25 | 30 | 45 | 45 | 45 | 50 |
|                                                | <sup>109</sup> Ag | 50 | 40 | 30  | 30* | 35 | 15 |     | 15 | 30 | 45 | 45 | 45 | 50 |
|                                                | <sup>129</sup> Xe | 50 | 40 | 30  | 30  | 35 | 25 | 25* |    | 30 | 45 | 45 | 45 | 50 |
|                                                | <sup>141</sup> Pr | 50 | 40 | 25  | 20* | 35 | 25 | 25  | 25 |    | 45 | 45 | 45 | 50 |
|                                                | <sup>165</sup> Ho | 60 | 50 | 45  | 45  | 45 | 45 | 45  | 45 |    | 30 | 30 | 65 | 65 |
|                                                | <sup>181</sup> Ta | 60 | 50 | 45  | 45  | 45 | 45 | 45  | 45 | 30 |    | 20 | 65 | 65 |
|                                                | <sup>197</sup> Au | 60 | 55 | 50  | 50  | 50 | 50 | 50  | 50 | 30 | 20 |    | 70 | 70 |
| Initial 15 A MeV Beam Configuration            |                   |    |    |     |     |    |    |     |    |    |    |    |    |    |
| w/Energy Changes                               |                   |    |    |     |     |    |    |     |    |    |    |    |    |    |

Courtesy Clark/TAMU
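
A change-time table like this one can drive the order in which ions are run. The sketch below brute-forces the cheapest sequence for a small ion list. The times are placeholder values assumed to be in minutes; they are chosen to be of the same order as the table but are not a transcription of it. The `CHANGE_MIN` layout and function names are hypothetical.

```python
from itertools import permutations

# Placeholder change times in minutes, CHANGE_MIN[initial][final].
# Illustrative values only; use the facility's current table for real planning.
CHANGE_MIN = {
    "Ne": {"Ar": 25, "Kr": 30, "Xe": 30},
    "Ar": {"Ne": 25, "Kr": 30, "Xe": 30},
    "Kr": {"Ne": 30, "Ar": 30, "Xe": 25},
    "Xe": {"Ne": 30, "Ar": 30, "Kr": 25},
}


def sequence_cost(order):
    """Total change time for running the ions in the given order."""
    return sum(CHANGE_MIN[a][b] for a, b in zip(order, order[1:]))


def best_order(ions, start=None):
    """Cheapest ordering of ions, optionally fixed to begin with `start`
    (for example, the ion the previous group ended with)."""
    candidates = permutations(ions)
    if start is not None:
        candidates = (p for p in candidates if p[0] == start)
    return min(candidates, key=sequence_cost)


ions = ["Ne", "Ar", "Kr", "Xe"]
order = best_order(ions, start="Xe")
print(" -> ".join(order), f"({sequence_cost(order)} min of changes)")
```

Exhaustive search is fine for the handful of ions in a typical run. The technical plan (for example, finding the threshold LET first) may still override the cheapest order.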

# WHEN THINGS GO ROTTEN



# First Rule of Test Club:

## The Test Plan Changes as soon as the Test Starts

- In other words, flexibility is needed to make decisions
  - If a test set/DUT isn't working or needs modification, how long do you spend trying to debug the situation?
    - » 30 minutes?, 1 hour?...
    - » Time = money.
  - It's best to discuss this in advance and decide what to do if/when things don't work (or you need to modify the test system to accommodate something unexpected).
- **Always (and I mean always) have a backup test (or three)** available if debug is taking too long or the test finishes early (hey, it does happen when the device responds worse (SEL at low LET) or better (fewer ions than expected) than anticipated).

# THUMB WARS



# Rules of Thumb 1 – Creating a Spreadsheet

- **This is about things to consider**
  - **Add or subtract variables as you see fit**
  - **Time per test run can vary widely depending on DUT, test goals, response, etc.: seconds to many minutes. Think about your data collection system and the data needed.**

| Item                                                | Time to perform/ change | Notes                                                                                                                   |
|-----------------------------------------------------|-------------------------|-------------------------------------------------------------------------------------------------------------------------|
| Test set install/removal                            |                         |                                                                                                                         |
| DUTs                                                |                         | DUT Types                                                                                                               |
| Samples                                             |                         | Samples of each DUT type                                                                                                |
| DUT Operating Conditions                            |                         | Bias, frequency, applications/loading, voltages, temperature, etc...                                                    |
| Ions                                                |                         |                                                                                                                         |
| Angles per ion                                      |                         |                                                                                                                         |
| Energies per ion                                    |                         |                                                                                                                         |
| Test runs per sample per ion/angle/energy condition |                         |                                                                                                                         |
| # of test runs                                      |                         | Multiply Amounts (DUTs to test runs per...)                                                                             |
| Average time per test run                           |                         | Include time to prepare test system for NEXT test run                                                                   |
| Total time (ideal)                                  |                         | Sum of (# test runs x avg time per run), time to change DUTs, samples, DUT operating conditions, ions, angles, energies |
| Additional margin                                   |                         | Slack for debug or ?                                                                                                    |
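
The spreadsheet logic can be sketched in a few lines: multiply the count rows to get the number of test runs, add the change overheads, then apply margin. Every number below is a hypothetical placeholder, and how many changes of each kind occur depends on how the runs are ordered, so the overhead counts are entered by hand. Names such as `estimate_minutes` are illustrative.

```python
import math


def total_runs(counts):
    """Multiply the count rows (DUT types down to runs per condition)."""
    return math.prod(counts.values())


def estimate_minutes(counts, avg_run_min, overheads, margin_fraction):
    """Return (ideal, with_margin) time estimates in minutes.

    overheads maps a label to (number of occurrences, minutes each).
    avg_run_min should include the time to prepare for the NEXT run.
    """
    run_min = total_runs(counts) * avg_run_min
    overhead_min = sum(n * minutes for n, minutes in overheads.values())
    ideal = run_min + overhead_min
    return ideal, ideal * (1 + margin_fraction)


# Hypothetical plan, not a recommendation.
counts = {
    "dut_types": 1,
    "samples_per_dut": 3,
    "operating_conditions": 2,
    "ions": 4,
    "angles_per_ion": 2,
    "energies_per_ion": 1,
    "runs_per_condition": 2,
}
overheads = {
    "install/removal": (1, 60),
    "sample changes": (2, 20),
    "operating condition changes": (3, 5),
    "ion changes": (3, 40),
    "angle changes": (12, 2),
}

ideal, padded = estimate_minutes(counts, avg_run_min=5, overheads=overheads,
                                 margin_fraction=0.2)
print(f"{total_runs(counts)} runs, ideal {ideal / 60:.1f} h, "
      f"with margin {padded / 60:.1f} h")
```

Comparing the result against the slot length shows quickly whether the matrix needs trimming before arriving at the facility.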

# Rules of Thumb 2: Figures of Merit - an Opinion Based on Experience (YMMV)



***Just remember the radiation engineer's mantra: IT DEPENDS!***

# And Finally...

## Estimating SEE Test Time has a Wide Array of Variables

- This talk provided some, but you may have others to add
- Obtaining beam access is competitive, so efficiency is a must

Experience and working with someone experienced is not a cure-all, but really does help

- Understand that even someone experienced is not an expert in all types of devices and SEE testing (wide-bandgap power vs SOC, for example)

Feel free to reach out if you have any questions

# Acronyms

- Atomic Mass Unit (amu)
- Brookhaven National Laboratory (BNL)
- Blue Screen Of Death (BSOD)
- By The Way (BTW)
- Central Processing Unit (CPU)
- Dual-Inline Package (DIP)
- Device Under Test (DUT)
- Delta Voltage (dV)
- Input/Output (I/O)
- Integrated Circuits (ICs)
- Intellectual Property (IP)
- Linear Energy Transfer (LET)
- Mega Electron Volts (MeV)
- Micron (um)
- NASA Space Radiation Laboratory (NSRL)
- Radiation Lot Acceptance Test (RLAT)
- Random Access Memory (RAM)
- Return on Investment (ROI)
- Single Event Effects (SEE)
- Single Event Functional Interrupt (SEFI)
- Single Event Transient (SET)
- Single Event Upset (SEU)
- TAMU Control Software (SEUSS)
- System on a Chip (SOC)
- Stopping and Range of Ions in Matter (SRIM)
- Texas A&M University (TAMU)
- Voltage-drain (Vdd)
- Your Mileage May Vary (YMMV)